Validity in early theory
Types of Validity in Early Theory
By: Ali Bahrami, Islamic Azad University, Research and Science Branch

‘Validity’ in testing and assessment has traditionally been understood to mean discovering whether a test ‘measures accurately what it is intended to measure’ (Hughes, 1989: 22), or uncovering the ‘appropriateness of a given test or any of its component parts as a measure of what it is purported to measure’ (Henning, 1987: 170). This view of validity presupposes that when we write a test we have an intention to measure something, that the ‘something’ is ‘real’, and that validity enquiry concerns finding out whether a test ‘actually does measure’ what is intended. These assumptions were built into the language of validity studies from the early days, but they are ones that we are going to question.

Three ‘Types’ of Validity in Early Theory

In the early days of validity investigation, validity was broken down into three ‘types’ that were typically seen as distinct. Each type of validity was related to the kind of evidence that would count towards demonstrating that a test was valid. Cronbach and Meehl (1955) described these as:

1. Criterion-oriented validity
   Predictive validity
   Concurrent validity
2. Content validity
3. Construct validity

We will introduce each of these in turn, and then show how this early approach has changed.

1. Criterion-oriented validity

When considering criterion-oriented validity, the tester is interested in the relationship between a particular test and a criterion about which we wish to make predictions. What we are really interested in here is the criterion: whatever it is that we wish to know about, but for which we do not have any direct evidence.
In this case the validity evidence is the strength of the predictive relationship between the test score and performance on the criterion. Of course, it is necessary to decide what would count as ‘ability to cope with’ the criterion situation, as it is something that must be measurable. Defining precisely what we mean by such words and phrases is a central part of investigating validity.

Predictive validity is the term used when the test scores are used to predict some future criterion, such as academic success. If the scores are used to predict a criterion at the same time the test is given, we are studying concurrent validity.

2. Content validity

Content validity is defined as any attempt to show that the content of the test is a representative sample from the domain that is to be tested. This is usually done using expert judges. These may be subject teachers, or language teachers who have many years’ experience in teaching, for example, business English. The judges are asked to look at texts that have been selected for inclusion on the test and evaluate them for their representativeness within the content area. Furthermore, the items used on the test should result in responses to the text from which we can make inferences about the test takers’ ability to process the texts in ways expected of students on their academic courses.

Carroll (1980: 67) argued that achieving content validity in testing English for Academic Purposes (EAP) consisted of describing the test takers, analyzing their ‘communicative needs’ and specifying test content on the basis of their needs. In early approaches to communicative language testing, the central issue in establishing content validity was how best to ‘sample’ from needs and the target domain (Fulcher, 1999a: 222–223).

3. Construct validity

The first problem with construct validity is defining what a ‘construct’ is.
Perhaps the easiest way to understand the term ‘construct’ is to think of the many abstract nouns that we use on a daily basis, but for which it would be extremely hard to point to an example. As we use these terms in everyday life we have no need to define them. We all assume that we know what they mean, and that the meaning is shared.

For a general term to become a construct, it must have two properties. Firstly, it must be defined in such a way that it becomes measurable. Secondly, any construct should be defined in such a way that it can have relationships with other constructs that are different. To put this another way, concepts become constructs when they are so defined that they can become ‘operational’: we can measure them in a test of some kind by linking the term to something observable (whether this is ticking a box or performing some communicative action), and we can establish the place of a construct in a theory that relates one construct to another (Kerlinger and Lee, 2000: 40).

• Test usefulness

Bachman and Palmer (1996: 18) have used the term ‘usefulness’ as a superordinate in place of construct validity, to include reliability, construct validity, authenticity, interactiveness and practicality. They have argued that overall usefulness should be maximized in terms of the combined contribution of the ‘test qualities’ that contribute to usefulness, and that the importance of each test quality changes according to context.

Reliability is the consistency of test scores across facets of the test. Authenticity is defined as the relationship between test task characteristics and the characteristics of tasks in the real world. Interactiveness is the degree to which the individual test taker’s characteristics (language ability, background knowledge and motivations) are engaged when taking a test. Practicality is concerned with test implementation rather than the meaning of test scores.
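
The definition of reliability above, as consistency of scores across facets of the test, can be made concrete with a small calculation. The sketch below is not taken from Bachman and Palmer. It assumes that test items are the facet of interest and uses Cronbach's alpha, one widely used internal-consistency coefficient, as a stand-in for ‘consistency’. The function name cronbach_alpha and the score matrix are hypothetical and serve only as illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Estimate internal consistency (Cronbach's alpha).

    `scores` is a list of rows, one row per test taker, where each row
    holds that person's score on every item. This layout is an
    assumption made for the illustration.
    """
    if len(scores) < 2:
        raise ValueError("need at least two test takers")
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")

    # Sample variance of each item (column) across test takers.
    item_variances = [variance(column) for column in zip(*scores)]

    # Sample variance of each test taker's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: four test takers, three items.
    example_scores = [
        [3, 4, 3],
        [5, 5, 4],
        [2, 2, 1],
        [4, 5, 5],
    ]
    print(round(cronbach_alpha(example_scores), 2))
```

Values closer to 1 indicate that the items rank test takers more consistently. Other facets, such as raters or test occasions, would call for different coefficients, so this is only one possible reading of ‘consistency’.
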
The notion of test ‘usefulness’ provides an alternative way of looking at validity, but it has not been extensively used in the language testing literature. This may be because downgrading construct validity to a component of ‘usefulness’ has not challenged mainstream thinking since Messick.

• The validity cline

In a series of important papers, Chapelle (1998, 1999a, 1999b) has considered how validity theory has changed in language testing since it was conceived as a property of a test (Lado, 1961: 321). In her work, Chapelle has characterized three current approaches to validity.

The first is traditional ‘trait theory’. For our purposes, a ‘trait’ is no different from the notion of a ‘construct’, as used by Cronbach and Meehl. It is assumed that the construct to be tested is an attribute of the test taker. The test taker’s knowledge and processes are assumed to be stable and real, and the test is designed to measure these. Score meaning is therefore established on the basis of correspondence between the score and the actuality of the construct in the test taker.

At the other end of the cline is what Chapelle terms the ‘new behaviorism’. In a behaviorist approach the test score is mostly affected by context, such as physical setting, topic and participants. These are typically called ‘facets’ in the language testing literature. In ‘real world’ communication there is always a context: a place where the communication typically takes place, a subject, and people who talk. For example, these could be a restaurant, ordering food, and a customer and a waiter. According to this view, if we wish to make an inference about a learner’s ability to order food, the ‘real world’ facets should be replicated in the test as closely as possible, or we are not able to infer meaning from the test score to the real-world criterion.
Fulcher (1995) and Fulcher and Márquez Reiter (2003) have shown that in a behaviorist approach, each test would be a test of performance in the specific situation defined in the facets of the test situation. ‘Validity’ would be the degree to which it could be shown that there is a correspondence between the real-world facets and the test facets, and score meaning could only be generalized to corresponding real-world tasks.

References

Fulcher, G. and Davidson, F. (2007) Language Testing and Assessment: An Advanced Resource Book. Routledge.
Shohamy, E. (2000) ‘Fairness in language testing.’ In Kunnan, A. J. (ed.) Fairness and Validation in Language Assessment. Studies in Language Testing 9. Cambridge: Cambridge University Press, 15–19.
Willingham, W. (1988) ‘Testing handicapped people – the validity issue.’ In Wainer, H. and Braun, H. (eds) Test Validity. Hillsdale, NJ: Erlbaum, 89–103.